ABSTRACT 

We show that a suitably defined marked correlation function can be used to break 
degeneracies in halo-occupation distribution modeling. The statistic can be computed 
on both 3D and 2D data sets, and should be applicable to all upcoming galaxy surveys. 
A proof of principle, using mock catalogs created from N-body simulations, is given. 
1 INTRODUCTION 

In recent years our ability to describe galaxy clustering 
has advanced dramatically. The halo model (Seljak 2000; 
Peacock & Smith 2000; see e.g. Cooray & Sheth 2002 for a 
review) has provided us with a physically informative and 
flexible means of describing galaxy bias - the relation be- 
tween galaxies and the underlying dark matter halos. The 
key insight is that an accurate prediction of galaxy clustering 
requires knowledge of how galaxies are apportioned between 
and distributed within halos - the halo occupation distri- 
bution or HOD. Combined with the theoretically predicted 
spatial distribution of halos from e.g. N-body simulations, a 
specified HOD makes strong predictions about a wide array 
of galaxy clustering statistics. The formalism is now widely 
used in the interpretation of galaxy clustering and to infer 
cosmological parameters from large-scale galaxy surveys. 

Much of the recent work on fitting HODs has used the 
two-point galaxy correlation function as the observation of 
choice. While the galaxy correlation function provides very 
strong constraints on the HOD, there exist degeneracies 
between the inferred HOD and the underlying cosmology. 
Much of this degeneracy arises because a change in the cos- 
mology, and hence the halo population, can be compensated 
to a large extent by a change in the galaxy halo occupa- 
tion. This modified halo occupation apportions galaxies dif- 
ferently amongst halos of different mass than the fiducial 
model. Not surprisingly, combining the galaxy correlation 
function with a second observable with different sensitivities 
to the HOD can lift such degeneracies (Zheng & Weinberg 
2007), tightening the constraints and allowing one to 
simultaneously constrain the cosmological world model and HOD 
(see e.g. Abazajian et al. 2005). A number of such observables 
have been considered in the literature. Galaxy-galaxy 
lensing has the potential to directly measure the mass of the 
halos hosting a galaxy population. Cross-correlating with 
another galaxy sample selected to live in high (or low) mass 
halos can help to break degeneracies, as can redshift space 
distortions (which are very sensitive to the satellite fraction) 
or peculiar velocities. Another observation is the abundance 
of rich clusters of galaxies, which can constrain the number 
density of massive halos. While all of these approaches are 
certainly valid, and will continue to be used in the future, a 
disadvantage is that they generally require additional obser- 
vations or measurements, with the associated modeling and 
additional systematics that must be calibrated. A natural 
question, therefore, is whether one can break these degen- 
eracies using only the data going into the clustering statistics 
themselves? 



Higher order clustering measurements provide one 
such approach (e.g. Kulkarni et al. 2007). A particularly 
convenient choice for our purposes is a marked correla- 
tion function, where the mark is determined from the 
galaxy spatial distribution. While this represents new in- 
formation, it is information which is readily available. 
We demonstrate here that appropriately chosen marked 
two-point correlation functions (Beisbart & Kerscher; 
Beisbart, Kerscher & Mecke 2002; Gottlöber et al.; 
Sheth & Tormen 2004; Sheth, Connolly & Skibba 2005; 
Harker et al. 2006; Wechsler et al. 2006) can lift degeneracies 
in the HOD or between the HOD and the cosmology. 
Such marked correlation functions are straightforward to 
compute with the same set of observations, and require no 
additional data nor understanding of the survey or algo- 
rithms beyond those required for basic clustering statistics. 
This paper serves as a proof-of-principle, by demonstrating 
that the degeneracy between HOD and the amplitude of 
the primordial fluctuation spectrum is broken with a simple 
density mark in simulations. 
Figure 1. (Top) The best-fit correlation function from our two 
cosmologies with $\sigma_8 = 0.8$ (open squares) and $\sigma_8 = 1.0$ (open 
triangles) along with the input 'data' (solid line). The line extends 
only over the range of the fit, and errors (between 5-10%) are 
suppressed for clarity. (Bottom) The ratio of the $\sigma_8 = 1$ correlation 
function to that of $\sigma_8 = 0.8$. Note that $\xi(r)$ for the two 
cosmologies is almost identical, and well within our assumed errors. 




Figure 2. The HODs for the two cosmologies: $\sigma_8 = 0.8$ (shaded 
region) and $\sigma_8 = 1.0$ (open triangles). The error bars and width 
of the shaded region show the standard deviation from elements 
of a Markov-chain Monte-Carlo. The HODs differ by $\sim 2\sigma$ at high 
mass. 



using an HOD prescription. We use a halo model which 
distinguishes between central and satellite galaxies with a mean 
occupancy of halos: $N(M) = \langle N_{\rm gal}(M_{\rm halo}) \rangle$. Each halo 
either hosts a central galaxy or does not, while the number 
of satellites is Poisson distributed about a mean $N_{\rm sat}$. We 
parameterize $N(M) = N_{\rm cen} + N_{\rm sat}$ with 5 parameters (e.g. 
Zheng et al. 2005) 



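$$N_{\rm cen}(M) = \frac{1}{2}\,{\rm erfc}\left[\frac{\ln(M_{\rm cut}/M)}{\sqrt{2}\,\sigma}\right] \qquad (1)$$

and

$$N_{\rm sat}(M) = \left(\frac{M - \kappa M_{\rm cut}}{M_1}\right)^{\alpha} \qquad (2)$$

for $M > \kappa M_{\rm cut}$ and zero otherwise. Different functional forms 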
have been proposed in the literature, but the current form 
is flexible enough for our purposes here. 
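
As an illustration, the mean occupation above can be evaluated with a few lines of Python. This is a minimal sketch, assuming the five parameters are a cutoff mass M_cut, a central-galaxy width sigma, a satellite mass scale M_1, an offset kappa and a power-law slope alpha. The function names and the numerical parameter values are hypothetical, chosen only for illustration, and are not the fitted values of this paper.

```python
import math

def n_cen(mass, m_cut, sigma):
    """Mean central occupation: 0.5 * erfc[ln(M_cut/M) / (sqrt(2) * sigma)]."""
    return 0.5 * math.erfc(math.log(m_cut / mass) / (math.sqrt(2.0) * sigma))

def n_sat(mass, m_cut, m1, kappa, alpha):
    """Mean satellite occupation: ((M - kappa*M_cut)/M_1)^alpha, zero below kappa*M_cut."""
    if mass <= kappa * m_cut:
        return 0.0
    return ((mass - kappa * m_cut) / m1) ** alpha

def n_total(mass, m_cut, sigma, m1, kappa, alpha):
    """Total mean occupation N(M) = N_cen + N_sat."""
    return n_cen(mass, m_cut, sigma) + n_sat(mass, m_cut, m1, kappa, alpha)

# Hypothetical parameter values (masses in h^-1 Msun), for illustration only.
params = dict(m_cut=1.0e13, sigma=0.5, m1=1.0e14, kappa=1.0, alpha=1.0)
for mass in (1.0e12, 1.0e13, 1.0e14, 1.0e15):
    print(f"M = {mass:.1e}  N_cen = {n_cen(mass, params['m_cut'], params['sigma']):.3f}"
          f"  N = {n_total(mass, **params):.3f}")
```

In a mock catalog each halo would then receive a central galaxy with probability N_cen and a Poisson-distributed number of satellites with mean N_sat, as described above.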

The fiducial galaxy sample is generated from the 
$\sigma_8 = 0.8$ simulation. It has a number density of 
$1.5 \times 10^{-3}\,h^{3}\,{\rm Mpc}^{-3}$ and a correlation length of about $7\,h^{-1}$Mpc. 
All errors are computed by Monte-Carlo methods, dividing 
the simulation into disjoint regions. For definiteness 
we consider a survey of volume 
$(250\,h^{-1}{\rm Mpc})^3 \simeq 1.6 \times 10^{7}\,h^{-3}\,{\rm Mpc}^3$, similar to the corresponding Sloan Digital Sky 
Survey sample, and scale the covariance matrices to that volume. 
This yields diagonal errors on $\xi(r)$ of around 5-10% 
and bin-to-bin correlations of 15-80%. When fitting HOD 
models to these data the best fits are "good" fits, and the 
parameter values are well within the range of HOD parameters 
seen for similar galaxy samples, and so both cosmologies 
are acceptable a priori. 

It is clear (Fig. 1) that the two-point correlation 
function by itself cannot distinguish between the two models - 
$\Delta\chi^2 < 1$ for 8 data points. The next sections demonstrate 
that a simple density mark, measurable from the spatial 
distribution of galaxies, strongly discriminates between these 
models. 
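
A significance such as the one quoted here is a difference in chi-squared between two model data vectors, evaluated with the full covariance matrix. The sketch below shows one way to compute it in pure Python. It assumes a positive-definite covariance matrix, and the function names and toy numbers (three bins, 10% errors, a bin-to-bin correlation of 0.5) are hypothetical and are not the measurements of this paper.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def delta_chi2(model_a, model_b, cov):
    """(a - b)^T C^{-1} (a - b), using the full covariance matrix C."""
    diff = [a - b for a, b in zip(model_a, model_b)]
    weighted = solve_linear(cov, diff)
    return sum(d * w for d, w in zip(diff, weighted))

# Toy data vectors and covariance, for illustration only.
xi_a = [10.0, 3.0, 1.0]
xi_b = [10.4, 2.9, 1.05]
errors = [0.1 * x for x in xi_a]
cov = [[(1.0 if i == j else 0.5) * errors[i] * errors[j] for j in range(3)]
       for i in range(3)]
print("Delta chi^2 =", delta_chi2(xi_a, xi_b, cov))
```

Ignoring the off-diagonal elements can noticeably change the result when neighbouring bins are strongly correlated, so the full matrix is used rather than the diagonal errors alone.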



2.2 Marked correlation functions 

The marked correlation function generalizes the standard 
correlation function by weighting galaxies by a numerical 
"mark". If the mark of the $i$th object is $m_i$, 
then the marked correlation function is defined as (e.g. 
Sheth, Connolly & Skibba 2005, Eq. 3) 



$$M(r) = \frac{1}{n(r)\,\bar{m}^2} \sum_{ij} m_i m_j , \qquad (3)$$



2 A WORKED EXAMPLE 
2.1 Degeneracies 

To illustrate our point we consider a mock galaxy sample, 
with characteristics similar to an $L_\star$ sample, at $z \sim 0.1$, 
and try to analyze this in two cosmologies that only differ 
in the normalization of the primordial power spectrum: 
$\sigma_8 = 0.8$ and 1.0. In both cosmologies a good fit to the 
2-point function can be found (Fig. 1), but the HOD differs 
(Fig. 2) because the halo mass function is different in the 
two cosmologies. 

The fiducial galaxy sample was generated, and the fits 
were done, by populating N-body simulations with galaxies 



where the sum is over all pairs of objects $(i, j)$ with 
separation $r_{ij} = r$, $n(r)$ is the number of pairs, and the mean 
mark, $\bar{m}$, is calculated over all objects in the sample. Note 
that, unlike $\xi$, $w_p$ or $w$, no random catalog is needed in the 
computation of $M(r)$. It is convenient to divide out the 
clustering of the average sample since $M(r) \neq 1$ then implies a 
difference in clustering by objects with different marks. The 
above expression can be applied in 2D or 3D, with angular 
or linear bins. 
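
The estimator just described can be sketched directly: sum the product of marks over pairs in each separation bin, then divide by the pair count and the squared mean mark. The brute-force Python below is only a sketch under simple assumptions (Euclidean separations, no periodic boundaries, no survey mask). The function name is hypothetical, and a production analysis would use a tree or grid based pair counter rather than a double loop.

```python
import math

def marked_correlation(positions, marks, bin_edges):
    """Return M(r) per bin: sum of m_i*m_j over pairs / (n_pairs * mean_mark^2)."""
    mean_mark = sum(marks) / len(marks)
    n_bins = len(bin_edges) - 1
    pair_counts = [0] * n_bins
    weighted = [0.0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # math.dist works for 2D or 3D coordinate tuples alike.
            r = math.dist(positions[i], positions[j])
            for b in range(n_bins):
                if bin_edges[b] <= r < bin_edges[b + 1]:
                    pair_counts[b] += 1
                    weighted[b] += marks[i] * marks[j]
                    break
    # Bins with no pairs have no defined M(r); report NaN there.
    return [w / (n * mean_mark ** 2) if n > 0 else float("nan")
            for w, n in zip(weighted, pair_counts)]

# Toy example: three points in 2D, two separation bins.
toy_positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(marked_correlation(toy_positions, [2.0, 2.0, 2.0], [0.0, 1.2, 2.0]))
print(marked_correlation(toy_positions, [1.0, 1.0, 4.0], [0.0, 1.2, 2.0]))
```

With identical marks the statistic is unity in every populated bin, which is the sense in which the clustering of the average sample has been divided out. Only data-data pairs enter, so the same pair loop that accumulates the ordinary pair counts can accumulate the weighted sum at no extra cost.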



(Footnote) We use the full covariance matrix when quoting significance 
levels. 
Figure 3. The marked correlation function, $M(r)$, for two HODs 
which provide good fits to $\xi(r)$ in the two cosmologies, $\sigma_8 = 0.8$ 
(open squares) and $\sigma_8 = 1.0$ (open triangles). The mark is $\rho/(\rho_\star + \rho)$ 
with $\rho$ determined by kernel estimation using 4 neighbors and 
$\rho_\star = 25\,\bar{\rho}$ (see text). The error bars show the diagonal of the 
covariance matrix computed for the fiducial model. When $M(r)$ 
deviates from 1, the clustering is sensitive to the (local) density: 
we see small scale clustering is enhanced in regions of high density 
(as expected) and by different amounts in the two models. 

The choice of mark depends on the application. In the 
example above, we would like a mark that encodes information 
about which halos host which galaxies. One set of such 
marks would be functions of the local density, which can be 
computed in a number of ways, e.g. the distance to the $n$th 
nearest neighbor, the number of neighbors within a fixed 
metric aperture, spline kernel interpolation (e.g. Dehnen 
2001) or kernel deprojection (e.g. Eisenstein 2003). Massive 
halos tend to host more galaxies, at higher density, than 
lower mass halos, so if the HOD is changed we expect the 
density-marked correlation function to differ. The relation 
of local density to the number density of groups or clusters 
(which are known to break degeneracies in model fitting, 
e.g. Zheng & Weinberg 2007) can be complex, but is easily 
calculable from a mock catalog. 

What function of $\rho$ should we choose as our mark? The 
choice is arbitrary, but $\rho^n/(\rho_\star^n + \rho^n)$ has the nice property 
that it tends to zero for $\rho \ll \rho_\star$ and unity for $\rho \gg \rho_\star$, the 
rapidity of the transition being controlled by $n$. This means 
the dynamic range in the mark is limited, which leads to 
more stable results. If we are concerned that our density 
estimator may be noisy, which is often true in practice, we 
should choose a low value of $n$. Hereafter we choose $n = 1$. 
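
The behaviour of this family of marks is easy to tabulate. The short sketch below evaluates the mark for a few densities and exponents; the function name is hypothetical, and the transition density of 25 times the mean is the value quoted in the caption of Figure 3.

```python
def density_mark(rho, rho_star, n=1):
    """Mark rho^n / (rho_star^n + rho^n): near 0 at low density, near 1 at high density."""
    return rho ** n / (rho_star ** n + rho ** n)

rho_star = 25.0  # transition density in units of the mean density
for rho in (0.25, 2.5, 25.0, 250.0, 2500.0):
    row = "  ".join(f"n={n}: {density_mark(rho, rho_star, n):.4f}" for n in (1, 2, 4))
    print(f"rho/rho_star = {rho / rho_star:7.2f}  {row}")
```

The mark is bounded between 0 and 1 and equals 1/2 at the transition density for any exponent. Larger exponents push the mark towards a step function, so a small error in a density estimate near the transition produces a larger change in the mark, which is why a low exponent is preferred for noisy estimators.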



2.3 Different HODs, different marks 

We now measure the local-density marked correlation 
function for our two example HODs. To begin we imagine that 
we can use spectroscopy or multi-band photometry to select 
a sample of galaxies in a slice $\pm 50\,h^{-1}$Mpc. At our 
fiducial $z \simeq 0.1$, or $\chi \simeq 300\,h^{-1}$Mpc, this corresponds to 
$\Delta z/(1+z) \simeq 1.5\%$. In this 2D slice we estimate the density 
using spline kernel interpolation with 4 nearest (in projection) 
neighbors. Not surprisingly, we find that this density is 
much higher for objects which live in massive halos than for 
those which live in smaller halos. As the width of the slice is 



increased the contrast in density between high and low mass 
halos is reduced, but the trend remains the same. Note that 
our goal is not to optimize the density estimator, but to 
demonstrate that a useful estimator may be computed even 
for samples with limited redshift information. Of course, the 
exact choice of estimator will depend on the data set being 
considered. 

To pick a reasonable value of $\rho_\star$ we note that halos of 
$10^{14}\,h^{-1}M_\odot$ host $\mathcal{O}(10)$ galaxies in our models and cover 
$\sim 5\,(h^{-1}{\rm Mpc})^2$ in projection. Projected over $\pm 50\,h^{-1}$Mpc the 
background density is $\sim 0.1\,(h^{-1}{\rm Mpc})^{-2}$, so massive halos 
are $\sim 20$ times denser than the mean ($\bar{\rho}$). We pick $\rho_\star = 25\,\bar{\rho}$ 
as a convenient round number, though our conclusions are 
not sensitive to the exact choice. 
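
The order-of-magnitude argument above can be checked with a few lines of arithmetic. In this sketch the sample density of 1.5e-3 galaxies per (h^-1 Mpc)^3 is our reading of the value quoted for the fiducial sample, the remaining inputs are the round numbers given in the text, and the function names are hypothetical.

```python
def projected_density(n_3d, half_depth):
    """Mean surface density of a slice of +/- half_depth, in (h^-1 Mpc)^-2."""
    return n_3d * 2.0 * half_depth

def halo_contrast(n_in_halo, halo_area, background):
    """Ratio of the surface density inside a halo to the background surface density."""
    return (n_in_halo / halo_area) / background

n_3d = 1.5e-3  # assumed sample number density, (h^-1 Mpc)^-3
for half_depth in (50.0, 125.0):
    background = projected_density(n_3d, half_depth)
    print(f"+/-{half_depth:.0f} h^-1 Mpc: background = {background:.3f},"
          f" contrast = {halo_contrast(10.0, 5.0, background):.1f}")

# Using the rounded background of 0.1 (h^-1 Mpc)^-2 quoted in the text:
print("contrast with rounded background:", halo_contrast(10.0, 5.0, 0.1))
```

The unrounded background for the thin slice is 0.15 per (h^-1 Mpc)^2, which the text rounds to 0.1; either way the contrast is of order 10 to 20, and it drops by a factor of 2.5 when the slice is widened to the full 250 h^-1 Mpc depth.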

Figure 3 shows that this marked correlation function 
on sub-Mpc scales is different for our two samples, reflecting 
the differences in the HOD (Fig. 2). How discriminatory 
is this measure? For our fiducial volume, $\Delta\chi^2 = 33$ 
for the two marked correlation functions, compared with 
$\Delta\chi^2 < 1$ for the unweighted correlations. Since almost all 
of the difference comes from the lowest 4 data points, the 
two models can be strongly discriminated (> 99% assuming 
Gaussian errors). The distribution of the marks is almost 
the same in the two samples, and the difference in 
$M(r)$ remains even if we rescale the marks in one model to 
match the distribution in the other, showing that the 
difference is robust. We also note that the relevant measure of 
error for $M(r)$ comes from the Monte-Carlo estimation of 
the covariance matrix. Simply scrambling the marks allows 
us to test for a density dependence of the correlation 
function (Sheth, Connolly & Skibba 2005), which is detected in 
all of our catalogs at extremely high significance, but does 
not tell us how to compare different $M(r)$ to each other. 
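
The scrambling test mentioned above can be sketched as follows: the marks are randomly reassigned among the galaxies, the statistic is recomputed, and the measured value is compared with the scrambled distribution. This is a minimal single-bin illustration on a hypothetical toy catalog (a dense clump with high marks plus a sparse grid with low marks), not the procedure or data of this paper, and the function names are ours.

```python
import math
import random

def marked_stat(positions, marks, r_max):
    """Single-bin M: mean of m_i*m_j over pairs closer than r_max, over mean_mark^2."""
    mean_mark = sum(marks) / len(marks)
    total, n_pairs = 0.0, 0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < r_max:
                total += marks[i] * marks[j]
                n_pairs += 1
    return total / (n_pairs * mean_mark ** 2)

def scramble_test(positions, marks, r_max, n_shuffles=200, seed=0):
    """Return measured M and the fraction of scrambles at least as far from 1."""
    rng = random.Random(seed)
    measured = marked_stat(positions, marks, r_max)
    shuffled = list(marks)
    n_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any link between mark and position
        if abs(marked_stat(positions, shuffled, r_max) - 1.0) >= abs(measured - 1.0):
            n_extreme += 1
    return measured, n_extreme / n_shuffles

# Toy catalog: 10 clump members with mark 1.0, 30 isolated objects with mark 0.1.
clump = [(0.1 * k, 0.0) for k in range(10)]
field = [(10.0 + 5.0 * i, 10.0 + 5.0 * j) for i in range(6) for j in range(5)]
toy_positions = clump + field
toy_marks = [1.0] * len(clump) + [0.1] * len(field)
measured, fraction = scramble_test(toy_positions, toy_marks, r_max=1.0)
print(f"M = {measured:.2f}, fraction of scrambles as extreme = {fraction:.3f}")
```

Because scrambling destroys the relation between mark and environment, it only tests whether the statistic differs from unity. Comparing two models with each other still requires the covariance matrix estimated from mock catalogs, as noted above.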

Our initial choice of slice width, $\pm 50\,h^{-1}$Mpc, was possibly 
optimistic for surveys at higher redshift. As we increase 
the width of the slice the density contrast decreases and 
the significance with which we can differentiate the models 
also decreases. For a slice $\pm 125\,h^{-1}$Mpc in width, i.e. the 
full depth of our fiducial $(250\,h^{-1}{\rm Mpc})^3$ survey, using the 
same mark as above, the two models in Figure 3 differ by 
$\Delta\chi^2 = 19$. It is easily conceivable that a different choice of 
$\rho_\star$ or a higher power of $\rho$ in the mark could increase the 
discriminatory power of $M(r)$, but this is already reasonably 
significant given that only the 4 points with $r < 1\,h^{-1}$Mpc 
contribute. If it proves impossible to select slices as thin as 
$\pm 125\,h^{-1}$Mpc for some particular sample, it is always possible 
to jointly analyze samples using cross-correlation marks 
where one of the samples can be well isolated in distance 
(e.g. red galaxies with strong breaks). Such statistics would 
need to be analyzed on a case-by-case basis. 



3 CONCLUSIONS 

Although the galaxy two-point correlation function has 
proved to be extremely useful in modeling the relationship 
between galaxies and dark matter, it does not exhaust the 
information in the data. One degeneracy that cannot be bro- 
ken by the correlation function alone is between the HOD 
and cosmology - within the context of the correlation func- 
tion, one is free to re-apportion galaxies to compensate for 
differences in the halo mass function. The number of such 
degeneracies will only increase as we attempt more detailed 
mappings of galaxies to dark matter in the future. This note 
provides proof of principle that such degeneracies can be 
lifted by marked correlation functions. 

An important advantage of marked correlations is that 
they do not involve multiple data sets. Indeed, for the exam- 
ple presented here, the mark required no additional informa- 
tion beyond the spatial distribution of galaxies (and survey 
mask), which one needed to compute the correlation func- 
tion in the first place. Using the same data set considerably 
simplifies any modeling step. Furthermore the marked cor- 
relation function can be computed with the same code and 
at the same time as the standard correlation function, for 
which optimized algorithms exist. This is a non-negligible 
advantage when one considers the need to repeatedly com- 
pute it for mock samples while modeling, estimating errors, 
etc. 

Rapid advances in computational power and algorithm 
development have made it reasonably straightforward to 
simulate the distribution of dark matter halos in large vol- 
umes for almost any cosmological model. Combined with a 
halo occupation approach this makes "forward modeling" of 
almost any galaxy statistic possible. This implies that statis- 
tics which use all of the galaxy information, in the manner 
of standard large-scale structure n-point statistics, can be 
just as useful as those which try to identify special subsets of 
galaxies ('cluster' vs. 'field'). As our ability to model a wider 
array of observations (e.g. weak lensing, Sunyaev-Zel'dovich 
decrement or X-ray flux) matures, similar methods can be 
applied to these observations, bypassing the need to relate 
particular features in a map with individual 3D structures. 